0.1 Background

0.2 Handel Polyclonal infections

Create two objects containing the list of samples that are monoclonals and polyclonals

monoclonals = Pviv_rGenome_filtered@metadata %>% 
  filter(Clonality %in% c('Monoclonal','Highly related Polyclonal')) %>%
  select(Sample_id) %>% unlist()

polyclonals = Pviv_rGenome_filtered@metadata %>% 
  filter(Clonality %in% c('Polyclonal')) %>%
  select(Sample_id) %>% unlist()

Create a new rGenome object that only has the information of monoclonals samples

Pviv_rGenome_object_mono = filter_samples(
      Pviv_rGenome_filtered , 
      v = Pviv_rGenome_filtered@metadata$Sample_id %in% monoclonals)

Calculate the expected heterozygosity assuming that all samples are diploids

Pviv_rGenome_filtered@loci_table$ExpHet_all_as_diploid = 
  get_ExpHet(obj = Pviv_rGenome_filtered, update_AC = T, monoclonals = NULL, polyclonals = c(monoclonals, polyclonals))

Calculate the expected heterozygosity considering that all samples are monoclonals, it means that secondary alleles won’t be included

Pviv_rGenome_filtered@loci_table$ExpHet_all_as_haploid = 
  get_ExpHet(obj = Pviv_rGenome_filtered, update_AC = T, monoclonals = c(monoclonals, polyclonals), polyclonals = NULL)

Calculate the expected heterozygosity considering only monoclonal samples

Pviv_rGenome_filtered@loci_table$ExpHet_only_monoclonals = 
  get_ExpHet(
    Pviv_rGenome_object_mono, 
    update_AC = T, 
    monoclonals = monoclonals)

Calculate the expected heterozygosity defining the ploidy of each position individually

Pviv_rGenome_filtered@loci_table$ExpHet_default = get_ExpHet(obj = Pviv_rGenome_filtered)

Calculate the expected heterozygosity defining the ploidy of each samples based on our previous classification. It means that in monoclonal samples secondary alleles will be removed, while in polyclonal samples all observed alleles will be included

Pviv_rGenome_filtered@loci_table$ExpHet_default2 = 
  get_ExpHet(obj = Pviv_rGenome_filtered, update_AC = T, monoclonals = monoclonals, polyclonals = polyclonals)

Create a plot of ExpHet using the different approaches

Pviv_rGenome_filtered@loci_table %>%
  pivot_longer(cols = c('ExpHet_all_as_haploid', 'ExpHet_only_monoclonals', 'ExpHet_default', 'ExpHet_default2'), names_to = 'Treatment', values_to = 'ExpHet')%>%
  ggplot(aes(x = ExpHet_all_as_diploid, y = ExpHet))+
  geom_point(alpha = .1)+
  facet_grid(.~Treatment)+
  theme_bw()
Figure 1: Expected heterozygosity. Panel 1 (All samples as haploid) includes monoclonal and polyclonal infections and assumes that only one clone is present in each sample (only the primary allele is taken in polyclonal infections). Panle 2 (Default) discriminate monoclonal and polyclonal infections individualy for each polymorphic site, thus the sample size (number of alleles in the population) for each site is diferent. In Panel 3 (Default 2), for monoclonal samples only the primary allele is considered and all secondary alleles are masked, while polyclonal samples are handeled as diploids. In the fourth panel only the primary alleles of monoclonal infections are used. The Expected Heterozygosity considering that al samples are polyclonals (all samples are diploid as in vcftools) was used as control. Monoclonals and highly related polyclonal samples were cosidered as as monoclonal infections.

Figure 1: Expected heterozygosity. Panel 1 (All samples as haploid) includes monoclonal and polyclonal infections and assumes that only one clone is present in each sample (only the primary allele is taken in polyclonal infections). Panle 2 (Default) discriminate monoclonal and polyclonal infections individualy for each polymorphic site, thus the sample size (number of alleles in the population) for each site is diferent. In Panel 3 (Default 2), for monoclonal samples only the primary allele is considered and all secondary alleles are masked, while polyclonal samples are handeled as diploids. In the fourth panel only the primary alleles of monoclonal infections are used. The Expected Heterozygosity considering that al samples are polyclonals (all samples are diploid as in vcftools) was used as control. Monoclonals and highly related polyclonal samples were cosidered as as monoclonal infections.

Create a plot of the distribution of ExpHet by by type of marker

Pviv_rGenome_filtered@loci_table %>%
  pivot_longer(cols = c('ExpHet_all_as_diploid','ExpHet_all_as_haploid', 'ExpHet_only_monoclonals', 'ExpHet_default', 'ExpHet_default2'), names_to = 'Treatment', values_to = 'ExpHet')%>%
  ggplot(aes(fill = Treatment, x = ExpHet))+
  geom_histogram(alpha = .4)+
  facet_grid(TypeOf_Markers~Treatment, scales = 'free_y')+
  theme_bw()
Figure 2: Distribution of expected heterozygosity by type of polymorphims.

Figure 2: Distribution of expected heterozygosity by type of polymorphims.

Calculate Heterozygosity by each population (Subnational_level0: Regions within Colombia)

ExpHet_by_Pop= 
  get_ExpHet(obj = Pviv_rGenome_filtered, update_AC = T, monoclonals = monoclonals, polyclonals = polyclonals, by = 'Country')
ExpHet_by_Pop %>%
  pivot_longer(cols = c('Perú', 'Venezuela'), names_to = 'Region', values_to = 'ExpHet')%>%
  ggplot(aes(fill = Region, x = ExpHet))+
  geom_histogram(alpha = .4)+
  facet_grid(TypeOf_Markers~Region, scales = 'free_y')+
  theme_bw()
Figure 3: Distribution of Expected Heterozygosity by type of polymorphism and country. Expected Heterozygosity was calculated using only the primary allele of monoclonal infections and handeling polyclonal infections as diploids.

Figure 3: Distribution of Expected Heterozygosity by type of polymorphism and country. Expected Heterozygosity was calculated using only the primary allele of monoclonal infections and handeling polyclonal infections as diploids.

0.3 Nucleotide diversity

Calculate Nuc. Div. by gene by Population (Subnational_level0)

pi_by_gene_by_country = get_nuc_div(Pviv_rGenome_filtered, monoclonals = monoclonals, polyclonals = polyclonals, gff = '../reference/genes.gff', type_of_region = 'gene', window = NULL, by = 'Country')
mhp_pi_by_gene_by_country = pi_by_gene_by_country %>%
  pivot_longer(cols = c('Perú_pi', 'Venezuela_pi', 'Total_pi'), names_to = 'Regions', values_to = 'pi')%>%
  mutate(CHROM2  = gsub('(^(\\d|P|v)+_|_v1)', '',seqid))%>%
  ggplot(aes(x = start, y = pi, color = Regions)) +
    geom_point(alpha = 0.5, size = .25) +
    scale_color_manual(values = c('dodgerblue3', 'firebrick3', 'gray40'))+
    facet_grid(Regions ~ CHROM2, scales = 'free_x')+
    theme_minimal()+
    labs(y = 'Nuc. Div.', x = 'Chromosomal position', color = 'Region')+
    theme(legend.position = 'right',
          axis.text.x = element_blank(),
          axis.title.x = element_blank())
mhp_pi_by_gene_by_country = ggplotly(mhp_pi_by_gene_by_country)

chromosomes = c('PvP01_01_v1', 'PvP01_02_v1', 'PvP01_03_v1', 'PvP01_04_v1',
                'PvP01_05_v1', 'PvP01_06_v1', 'PvP01_07_v1', 'PvP01_08_v1',
                'PvP01_09_v1', 'PvP01_10_v1', 'PvP01_11_v1', 'PvP01_12_v1',
                'PvP01_13_v1', 'PvP01_14_v1', 'PvP01_API_v1', 'PvP01_MIT_v1')

for(region in 1:3){
  for(chrom in 1:16){
    mhp_pi_by_gene_by_country$x$data[[chrom + (region - 1)*16]]$text = paste(mhp_pi_by_gene_by_country$x$data[[chrom + (region - 1)*16]]$text,
      pi_by_gene_by_country[pi_by_gene_by_country$seqid == chromosomes[chrom],]$gene_description, sep = '<br />')
  }
}


mhp_pi_by_gene_by_country

Figure 4: Nucleotide diversity by gene and Country. The claculus was done using only the primary allele of monoclonal infections and handeling polyclonal infections as diploids.

0.4 Genetic differentiation

pca_PerVen = fastGRM(obj = Pviv_rGenome_filtered, k = 2, monoclonals = monoclonals, polyclonals = polyclonals, Pop = 'Country')

pca_PerVen  %>% ggplot(aes(x = PC1, y = PC2, color = Country))+
  geom_point(alpha = .7, size = 2) +
  stat_ellipse(level = .6)+
  scale_color_manual(values = c('dodgerblue3', 'firebrick3'))+
  theme_bw()+
  labs(title = 'WGS - PerVen',
       color = 'Regions')
Figure 5: Principal component analysis. Colors are assigned by country.

Figure 5: Principal component analysis. Colors are assigned by country.

Genetic Differentiation within Peru

Pviv_rGenome_object_Peru = filter_samples(Pviv_rGenome_filtered, v = 
                                          Pviv_rGenome_filtered@metadata$Country == 'Perú')

monoclonals_per = Pviv_rGenome_object_Peru@metadata[Pviv_rGenome_object_Peru@metadata$Clonality %in% c('Monoclonbal',"Highly related Polyclonal"),][['Sample_id']]

polyclonals_per = Pviv_rGenome_object_Peru@metadata[Pviv_rGenome_object_Peru@metadata$Clonality == "Polyclonal",][['Sample_id']]

pca_Per = fastGRM(obj = Pviv_rGenome_object_Peru, k = 2, monoclonals = monoclonals_per, polyclonals = polyclonals_per, Pop = 'Subnational_level0')


pca_Per  %>% ggplot(aes(x = PC1, y = PC2, color = Subnational_level0))+
  geom_point(alpha = .7, size = 2) +
  theme_bw()+
  labs(title = 'WGS - Per',
       color = 'Regions')
Figure 6: Principal component analysis of peruvian samples. Colors are assigned by locality.

Figure 6: Principal component analysis of peruvian samples. Colors are assigned by locality.

Genetic Differentiation within Peru

Pviv_rGenome_object_Ven = filter_samples(Pviv_rGenome_filtered, v = 
                                          Pviv_rGenome_filtered@metadata$Country == 'Venezuela')

monoclonals_ven = Pviv_rGenome_object_Ven@metadata[Pviv_rGenome_object_Ven@metadata$Clonality %in% c('Monoclonbal',"Highly related Polyclonal"),][['Sample_id']]

polyclonals_ven = Pviv_rGenome_object_Ven@metadata[Pviv_rGenome_object_Ven@metadata$Clonality == "Polyclonal",][['Sample_id']]

pca_Ven = fastGRM(obj = Pviv_rGenome_object_Ven, k = 2, monoclonals = monoclonals_ven, polyclonals = polyclonals_ven, Pop = 'Subnational_level2')

pca_Ven  %>% ggplot(aes(x = PC1, y = PC2, color = Subnational_level2))+
  geom_point(alpha = .7, size = 2) +
  stat_ellipse(level = .6)+
  theme_bw()+
  labs(title = 'WGS - Ven',
       color = 'Regions')
Figure 7: Principal component analysis of samples from Venezuela. Colors are assigned by Subnational level 2 Municipality).

Figure 7: Principal component analysis of samples from Venezuela. Colors are assigned by Subnational level 2 Municipality).

0.5 Drug Resistance Surveillance

drug_haplotypes = get_haplotypes_respect_to_reference(obj = Pviv_rGenome_filtered,
                                    gene_ids = c('PVP01_1010900','PVP01_0203000'),
                                    gene_labels = c('pvmdr1','pvmrp1'),
                                    gff_file = "../reference/genes.gff",
                                    fasta_file = "../reference/PvP01.v1.fasta",
                                    monoclonals = monoclonals,
                                    polyclonals = polyclonals,
                                    variables = c('Country', 'Year_of_Collection'))
## use default substitution matrix
## use default substitution matrix
drug_haplotypes$haplo_freq_plot+
  #theme_bw()+
  theme(axis.text = element_text(size = 5),
        legend.text = element_text(size = 5))+
  guides(fill=guide_legend(ncol=1))
Figure 8: Haplotype frquency of genes MDR1 and MRP1 by country

Figure 8: Haplotype frquency of genes MDR1 and MRP1 by country